On weight initialization in deep neural networks
Abstract
A proper initialization of the weights in a neural network is critical to its convergence. Current insights into weight initialization come primarily from linear activation functions. In this paper, I develop a theory for weight initializations with non-linear activations. First, I derive a general weight initialization strategy for any neural network using activation functions differentiable at 0. Next, I derive the weight initialization strategy for the Rectified Linear Unit (RELU), and provide theoretical insights into why the Xavier initialization is a poor choice with RELU activations. My analysis provides a clear demonstration of the role of non-linearities in determining the proper weight initializations.
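The claim about Xavier initialization and RELU can be checked numerically with a small experiment. The sketch below is not the paper's derivation: the abstract does not state its formulas, so the two weight variances used here are assumptions taken from common practice, namely 1/n (Xavier for a square layer of width n) and 2/n (the scaling usually attributed to He et al.). The function name `final_mean_square` and its parameters are hypothetical, chosen only for this illustration.

```python
import math
import random


def relu(x):
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def final_mean_square(variance_scale, width=128, depth=20, seed=0):
    """Push one random input through `depth` square RELU layers.

    Every weight is drawn from N(0, variance_scale / width); biases are
    omitted. Returns the mean squared activation of the last layer.
    variance_scale = 1.0 is the assumed Xavier setting for square layers,
    variance_scale = 2.0 the assumed He-style setting.
    """
    rng = random.Random(seed)
    std = math.sqrt(variance_scale / width)
    # Standard-normal input, so its mean square starts close to 1.
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        # Each output unit gets its own freshly sampled row of weights.
        x = [
            relu(sum(rng.gauss(0.0, std) * xi for xi in x))
            for _ in range(width)
        ]
    return sum(v * v for v in x) / width


xavier = final_mean_square(1.0)
he = final_mean_square(2.0)
print("Xavier-style scale, final mean square:", xavier)
print("He-style scale, final mean square:", he)
```

Under these assumptions, a RELU layer passes on about half of the second moment of its pre-activations, so the 1/n scaling roughly halves the signal at every layer while the 2/n scaling keeps it at the same order of magnitude. The depth of 20 and width of 128 are arbitrary choices; a larger width reduces the run-to-run fluctuation.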
Related resources
Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
It is well known that the initialization of weights in deep neural networks can have a dramatic impact on learning speed. For example, ensuring the mean squared singular value of a network’s input-output Jacobian is O(1) is essential for avoiding the exponential vanishing or explosion of gradients. The stronger condition that all singular values of the Jacobian concentrate near 1 is a property k...
Deep Jointly-Informed Neural Networks
In this work a novel, automated process for determining an appropriate deep neural network architecture and weight initialization based on decision trees is presented. The method maps a collection of decision trees trained on the data into a collection of initialized neural networks, with the structure of the network determined by the structure of the tree. These models, referred to as “deep jo...
Kernel Reparametrization Trick
While deep neural networks have achieved state-of-the-art performance on many tasks across varied domains, they still remain black boxes whose inner workings are hard to interpret and understand. In this paper, we develop a novel method for efficiently capturing the behaviour of deep neural networks using kernels. In particular, we construct a hierarchy of increasingly complex kernels that enco...
Cystoscopy Image Classification Using Deep Convolutional Neural Networks
In the past three decades, the use of smart methods in medical diagnostic systems has attracted the attention of many researchers. However, no smart activity has been provided in the field of medical image processing for diagnosis of bladder cancer through cystoscopy images despite the high prevalence in the world. In this paper, two well-known convolutional neural networks (CNNs) ...
Deep Residual Networks and Weight Initialization
Residual Network (ResNet) is the state-of-the-art architecture that enables successful training of very deep neural networks. It is also known that good weight initialization of a neural network avoids the problem of vanishing/exploding gradients. In this paper, simplified models of ResNets are analyzed. We argue that goodness of ResNet is correlated with the fact that ResNets are relatively insens...
Journal: CoRR
Volume: abs/1704.08863
Publication date: 2017